Keep Document Parts (Text Processing)
Synopsis
Extracts the text of a token that matches a given regular expression and returns it.
Description
This operator extracts a part of a token using a regular expression. It searches for the first region of the text that matches the given regular expression and returns this region as a new token. If no such region is found, the token is discarded. Because this tends to work best on long tokens, the operator is especially useful before the actual tokenization is applied during word vector creation.
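The behavior described above can be illustrated with a minimal Python sketch. This is not the operator's implementation: the function names `keep_document_part` and `keep_document_parts` are hypothetical, tokens are modeled as plain strings, and Python's `re` syntax may differ in details from the regular expression dialect the operator accepts.

```python
import re
from typing import List, Optional


def keep_document_part(token: str, extraction_regex: str) -> Optional[str]:
    """Return the first region of `token` matching the regex, or None.

    Hypothetical helper: models a token as a plain string.
    """
    match = re.search(extraction_regex, token)  # first match, scanning left to right
    return match.group(0) if match else None


def keep_document_parts(tokens: List[str], extraction_regex: str) -> List[str]:
    """Replace each token by its first matching region; drop tokens without a match."""
    kept = []
    for token in tokens:
        part = keep_document_part(token, extraction_regex)
        if part is not None:  # tokens with no matching region are discarded
            kept.append(part)
    return kept


tokens = ["Order 4711 shipped on 2024-05-01", "no digits here"]
print(keep_document_parts(tokens, r"\d+"))  # → ['4711']
```

Only the first matching region of each token is kept, so the later digits in the first token are ignored, and the second token is dropped because nothing in it matches.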
Input
- document
The document port.
Output
- document
The document port.
Parameters
- extraction_regex
This regular expression specifies the part of the string that is extracted and returned. Range: